ABSTRACT

In this article, we introduce BioMe (biologically relevant
metals), a web-based platform for calculating various
statistical properties of metal-binding sites. Users can
obtain the following statistical properties: presence of
selected ligands in the metal coordination sphere,
distribution of coordination numbers, percentage of metal
ions coordinated by the combination of selected ligands,
distribution of monodentate and bidentate metal-carboxyl
bindings for Asp and Glu, percentage of particular binuclear
metal centers, distribution of coordination geometry,
descriptive statistics for a metal ion-donor distance and
percentage of the selected metal ions coordinated by each of
the selected ligands. Statistics are presented in numerical
and graphical forms. The underlying database contains
information about all contacts within 3 Å of a metal ion
found in the asymmetric crystal unit. The stored information
for each metal ion includes the Protein Data Bank code, the
structure determination method, the types of metal-binding
chains [protein, ribonucleic acid (RNA), deoxyribonucleic
acid (DNA), water and other], the names of the bound ligands
(amino acid residue, RNA nucleotide, DNA nucleotide, water
and other), the coordination number, the coordination
geometry and, if applicable, any other metal(s). BioMe is on
a regular weekly update schedule. It is accessible at
http://metals.zesoi.fer.hr.

INTRODUCTION 

Metal cations are constituents of approximately 40% of all
proteins (1); they take part in enzymatic reactions and are
essential partners in the assembly of functional ribonucleic
acid (RNA) structures (2,3). For example, the presence of
magnesium dications is essential for the formation and
stabilization of the transfer RNA tertiary structure (4,5).
The region around a metal often defines the so-called
'active site', where a particular chemical reaction can take
place. Each metal ion possesses a unique combination of
charge and rigidity of its coordination sphere. Removal of a
metal or its replacement by another is accompanied by loss,
reduction or even alteration of the enzyme's catalytic
potency [see for example (6)]. Knowledge of the metal ion's
environment, especially the types and number of electron
donors, is important for clarifying how specific a
metal-binding site is and how the desired chemical reaction
can be tuned.

To the best of our knowledge, this is the only website that
focuses on the statistical properties of binding sites.
Currently, several databases allow researchers to view
information on metals and metal-binding sites in proteins
[MetLigDB (7), MESPEUS (8), MIPS (9), MDB (10) and COMe (11)]
and RNAs [MeRNA (12)]. However, they are typically limited to
simple retrieval of Protein Data Bank (PDB) structures for
specified metal and donor atoms.

The main contribution of BioMe is that, unlike the databases
discussed above, it generates statistical reports for
predefined or user-defined PDB subsets. This approach allows
users to find characteristics of particular metal-binding
sites in a simple and straightforward manner. In addition, to
the best of our knowledge, BioMe is the only website that
distinguishes between various chain types [i.e. protein, RNA
and deoxyribonucleic acid (DNA)], thus enabling users to
easily focus on the chain of interest. Our website
additionally provides information about binuclear metal
centers, the occurrence of certain combinations of molecules
(13) in the metal ion coordination sphere and the metal
coordination geometry (14), for coordination numbers ranging
from 3 to 14. Finally, the MySQL dump of the BioMe underlying
database is publicly available. Among the existing databases,
as far as we know, only MDB (last updated in 2003) has a dump
available, whereas the MESPEUS SQL database can be queried
only on request (by sending an email to the author).

The main motivations for setting up BioMe are the
exponentially growing number of three-dimensional (3D)
structures deposited in the PDB (15) and the green chemistry
requirements for increasing the efficiency of existing
metalloenzymes and designing new ones. This work is a
continuation of our earlier work on the structure of
metal-binding sites in proteins (13). A user-friendly
interface enables scientists to compute their own statistics
and to easily extract the relevant data. In addition, the
website enables retrieval of information on natural as well
as toxic metal cations; more precisely, all metal ions found
in the PDB to be tightly bound to a protein and/or a nucleic
acid are considered. In contrast to the other databases
mentioned, BioMe enables queries for multiple selections of
metals and ligands. Furthermore, to give a more general
picture of the distribution of metal ions in different types
of proteins, a pre-calculated nonredundant PDB set is also
available.



METHODS 

The BioMe underlying database is built from the 3D structures
(PDB files) in which metal ions are coordinated by at least
two donor atoms from a protein chain or, alternatively, by at
least one donor atom from a nucleic acid chain. The purpose of
these constraints is to eliminate metals added for
crystallization purposes. For each PDB entry, we extracted the
PDB code, title, the method used to derive the 3D structure,
resolution and the release date. Distances between the metal
ions and the electron donor atoms (O, N, S, Cl and F), as well
as those between the metal ions themselves, are calculated
from the atomic coordinates. In cases where the metal ion
and/or some of its electron donors have multiple positions
within the crystal structure, the position with the highest
occupancy was selected. In this study, we distinguish protein,
RNA, DNA and 'other' chains, and water. Protein chains and
nucleic acid chains (RNA and DNA) are defined as chains with
at least 50 amino acid residues and at least five nucleotides,
respectively.
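The filtering rules described above can be illustrated with a minimal Python sketch. It is not the BioMe implementation: the `Atom` record, its field names and the helper functions are hypothetical, and the 3 Å contact limit is the one quoted for the underlying database. Parsing of PDB files is left out; the sketch assumes atoms have already been read into simple records.

```python
import math
from collections import namedtuple

# Hypothetical, simplified atom record; the field names are illustrative only.
Atom = namedtuple("Atom", "name element altloc occupancy chain_type xyz")

DONOR_ELEMENTS = {"O", "N", "S", "CL", "F"}  # donor elements listed in the text
CUTOFF = 3.0  # contact distance limit in angstroms


def classify_chain(n_residues, n_nucleotides):
    """Protein: at least 50 residues; nucleic acid: at least 5 nucleotides."""
    if n_residues >= 50:
        return "protein"
    if n_nucleotides >= 5:
        return "nucleic"
    return "other"


def highest_occupancy(positions):
    """Among alternate positions of one atom, keep the most occupied one."""
    return max(positions, key=lambda atom: atom.occupancy)


def contacts(metal, atoms, cutoff=CUTOFF):
    """Return (atom, distance) pairs for donor atoms within the cutoff."""
    found = []
    for atom in atoms:
        if atom.element.upper() not in DONOR_ELEMENTS:
            continue  # carbon, hydrogen and other metals are not donors
        distance = math.dist(metal.xyz, atom.xyz)
        if distance <= cutoff:
            found.append((atom, distance))
    return found


def is_accepted(metal, atoms):
    """At least two protein donors, or at least one nucleic acid donor."""
    near = contacts(metal, atoms)
    n_protein = sum(1 for atom, _ in near if atom.chain_type == "protein")
    n_nucleic = sum(1 for atom, _ in near if atom.chain_type == "nucleic")
    return n_protein >= 2 or n_nucleic >= 1
```

In this sketch the coordination number of a metal is simply `len(contacts(metal, atoms))`, and `highest_occupancy` would be applied to each group of alternate positions before any distance is computed.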

We use a distance limit of 3 Å to define a coordination bond,
as in (13). This high threshold was chosen to account for the
natural flexibility of a metal coordination sphere and for
possible coordination errors in structure determination. The
coordination number of a metal was calculated by counting all
electronegative atoms (O, N, S, Cl and F) within the 3 Å
range. The calculation of the coordination geometry (14) is
based on the geometrical pattern of the coordinating atoms
$L_1, L_2, \ldots, L_m$, which is described by the list $W$ of
all bond angles around the metal ion $M$, sorted in ascending
order:

$$W = \left( \angle(L_i, M, L_j) \mid i, j \in \{1, 2, \ldots, m\},\ i < j \right).$$

In the same way, a set of angle lists for ideal coordination
geometries was developed. The root mean square deviation
(RMSD) is calculated between the angle list of the particular
metal ion and each of the ideal geometry lists of the same
length. The list with the smallest RMSD is taken as the
best-fitting coordination geometry (16). Currently, the web
server calculates 22 different geometries for coordination
numbers ranging from 4 to 14. The ideal coordination
geometries used in this work are presented in Table 1.
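The angle-list comparison can be sketched in Python as follows. This is an illustration under stated assumptions rather than the server's code: the function names are hypothetical, only the two four-coordinate ideal geometries are tabulated, and treating the interface's default maximum RMSD of 15° as a rejection threshold is an assumption.

```python
import math
from itertools import combinations


def angle_list(metal, ligands):
    """Sorted list of all L_i-M-L_j angles (in degrees) with i < j."""
    angles = []
    for a, b in combinations(ligands, 2):
        u = [p - m for p, m in zip(a, metal)]
        v = [p - m for p, m in zip(b, metal)]
        cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
        # Clamp to [-1, 1] to guard against rounding before acos.
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos)))))
    return sorted(angles)


def rmsd(xs, ys):
    """Root mean square deviation between two equally long angle lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Ideal angle lists for the two four-coordinate geometries only;
# the other geometries would be tabulated in the same way.
TETRAHEDRAL = math.degrees(math.acos(-1.0 / 3.0))  # about 109.47 degrees
IDEAL = {
    "Tetrahedron": [TETRAHEDRAL] * 6,
    "Square planar": [90.0] * 4 + [180.0] * 2,
}


def best_geometry(metal, ligands, ideal=IDEAL, max_rmsd=15.0):
    """Return (name, rmsd) of the closest ideal geometry of the same length.

    The name is None when no ideal list matches in length or when the best
    RMSD exceeds max_rmsd (assumed here to act as a rejection threshold).
    """
    observed = angle_list(metal, ligands)
    candidates = {n: a for n, a in ideal.items() if len(a) == len(observed)}
    if not candidates:
        return None, None
    name = min(candidates, key=lambda n: rmsd(observed, candidates[n]))
    score = rmsd(observed, candidates[name])
    return (name if score <= max_rmsd else None), score
```

Because both lists are sorted before comparison, the fit does not depend on how the ligands are numbered, which is the point of describing a site by its ordered angle list.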

Whereas our previous work (13) used only a set of the most
representative metals, in this study we have taken into
account all the metals found in the PDB structures.

After parsing the whole PDB, we found 20 307 files with
structures that satisfied our conditions. These contain
43 326 protein chains, 650 DNA chains and 734 RNA chains.
However, a number of these structures are redundant. Hence, to
give an unbiased picture of the distribution of metal ions in
proteins, the list of metals in a nonredundant, representative
selection of structures is also available. The representative
set of protein chains and the corresponding PDB files was
extracted from the PDB (April 2012) according to a
pre-calculated nonredundant set of 'cluster-70 chains'
downloaded from the PDB site. From each cluster, we chose the
best-ranked chain and checked whether it contains a metal
cation that satisfies the defined conditions. If no such metal
was found in the best-ranked chain, we moved to the next chain
on the list and repeated the procedure until we found a
structure with a bound metal ion or reached the end of the
list.
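The selection of one representative chain per cluster amounts to a first-match search over a ranked list. A minimal Python sketch, assuming each cluster is given as a list of chain identifiers ordered from best to worst rank and that `has_bound_metal` is a hypothetical callable standing in for the metal-site criteria described above:

```python
def pick_representatives(clusters, has_bound_metal):
    """For each ranked cluster, return the first chain with a qualifying metal.

    clusters: iterable of lists of chain identifiers, best-ranked first.
    has_bound_metal: callable(chain_id) -> bool (hypothetical helper).
    Clusters in which no chain qualifies contribute nothing.
    """
    chosen = []
    for ranked_chains in clusters:
        for chain_id in ranked_chains:
            if has_bound_metal(chain_id):
                chosen.append(chain_id)
                break  # stop at the best-ranked chain that qualifies
    return chosen
```

A cluster whose chains contain no qualifying metal is skipped, which matches the "reached the end of the list" case in the text.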

The database is implemented with MySQL on a Gentoo 
Linux operating system (Intel Core2 Quad CPU Q6600 at 
2.40 GHz, 4 GB). 



USER INTERFACE 

The main purpose of the user interface is to create a query 
and to present the results (statistics) in an appropriate 
form. The starting page offers a large number of 
options, so users can easily tailor their queries. They can 
refine their search by type of the metal ions, the structure 
determination method (X-ray crystallography, NMR or 
both) and resolution, by the type of chain and combin- 
ation of ligands, by the coordination number, maximum 
RMSD for a metal coordination geometry (the default 
value is 15°) and by the threshold distance between the 
selected metal ions. A help page exists for each parameter. 

The search can be performed on the prepared list of all 
PDBs that fulfilled the requirements (metal ions with at 
least two donor atoms from a protein chain or at least one 
from a nucleic acid chain) and/or on the representative 
cluster-70 list. Moreover, users can narrow their search 
by specifying a list of structures. 

Currently, users can select up to 25 of the most
representative kinds of ions (Mg, Zn, Ca, Fe, Na, Mn, K, Sr,
Cu, Cd, Ni, Hg, Co, W, Os, Mo, Ba, Al, Tl, Au, Pt, Pb, V, Yb
and Sm). However, using the publicly available database dump,
users can perform a search for any metal available in the PDB.

BioMe distinguishes between five different types of ligands:
amino acid residues, RNA and DNA nucleotides, water and
'others'. In the output statistics, ligands belonging to the
first four types are presented with their names, whereas those
from the last group are presented just as 'others'. We found
that using the original names from PDB files for the last
group of ligands makes the graphical representation of the
results poorly legible, especially when the search is
performed for a large number of structures. Therefore, the
results for them are available on a separate page that can be
accessed with the 'Other list' button.

Table 1. Coordination geometries used in BioMe web server

Coordination number   Geometry
4                     Tetrahedron
4                     Square planar
5                     Trigonal bipyramid
5                     Square pyramid (tetragonal bipyramid)
6                     Octahedron
6                     Trigonal prism
7                     Octahedron, face capped
7                     Trigonal prism, square face monocapped
7                     Pentagonal bipyramid
8                     Dodecahedron (bisdisphenoid)
8                     Cube
8                     Hexagonal bipyramid
8                     Trigonal prism, square face bicapped
8                     Square antiprism
8                     Trigonal prism, triangular face bicapped
9                     Square antiprism, monocapped
9                     Trigonal prism, square face tricapped
10                    Square antiprism, bicapped
12                    Cuboctahedron
12                    Anticuboctahedron
12                    Icosahedron
14                    Hexagonal antiprism, bicapped

Among the available results, we distinguish between 
two types of statistics: statistics associated with the 
selected metals and statistics associated with the selected 
amino acids and nucleotides. 

For each metal, seven statistics are available: (i) relative
presence of selected ligands in the metal coordination sphere,
(ii) relative distribution of the coordination numbers, (iii)
percentage of metal ions with the selected coordination number
coordinated by a combination of selected ligands, (iv)
distribution of monodentate and bidentate metal-carboxyl
bindings for Asp and Glu, (v) number of particular binuclear
metal centers, (vi) distribution of coordination geometries
and (vii) average distances and standard deviations for
selected metal and donor atoms.

The last statistic, Statistics 8, is computed only if more
than one metal is selected. For each metal ion, it gives
information on how often it is coordinated by each of the
selected ligands.
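Two of the per-metal statistics, the relative distribution of coordination numbers and the mean and standard deviation of metal-donor distances, reduce to simple aggregations. The Python sketch below illustrates them on plain lists; the function names are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption, since the text does not say which one BioMe reports.

```python
from collections import Counter
from statistics import mean, stdev


def coordination_distribution(coordination_numbers):
    """Relative distribution (in percent) of coordination numbers."""
    counts = Counter(coordination_numbers)
    total = sum(counts.values())
    return {cn: 100.0 * n / total for cn, n in sorted(counts.items())}


def distance_summary(distances):
    """Mean and sample standard deviation of metal-donor distances.

    A single distance has no spread, so 0.0 is returned in that case.
    """
    spread = stdev(distances) if len(distances) > 1 else 0.0
    return mean(distances), spread
```

For example, four metal sites with coordination numbers 4, 4, 6 and 5 give 50% for coordination number 4 and 25% each for 5 and 6.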

With the exception of Statistics 3 and Statistics 7, all
statistics are presented both numerically and graphically.
Graphs include pie and column charts. As an illustration,
Figure 1 shows a result for Statistics 1. In this example, the
query includes Mg ions with any coordination number bound to
any type of RNA nucleotide (in the RNA chain). As can be seen
from Figure 1, Mg is mostly bound to G, followed by A, C and
U. Concurrent binding to G and A corresponds to binding to the
shared G·A pairs, as described in Stefan et al. (12) and
references therein.

[Figure 1 screenshot. Panel 'Ligands - MG': "This statistic
shows how atoms of certain metal ion coordinate with selected
ligands." Columns: Ligand, Metal count (A: 8606). "There are
348 PDB entries that satisfy the chosen criteria."]

Figure 1. A screenshot displaying results of statistics
number 1, showing the number of each type of RNA nucleotide
that binds to Mg. The statistics cover only nucleotides in RNA
chains.

[Figure 2 screenshot. Panel 'Combination - ZN': "This statistic
shows percentage of metal ions with selected coordination
number coordinated by a combination of selected ligands. For
this statistic it is necessary to use equal sign "=" for
coordination number. It is important to note that 'other'
ligands and waters are often included in coordination."]

Coord. #   Combination     Amount   PDB ID
4          3 ASP : 1 GLU   19       1BPN : 1LAP : 1MWO : 1PV9 : 2I0M : 3JRU : 3KQZ : 3QDH :
4          2 ASP : 2 GLU   15       1CFV : 1F35 : 1KSP : 1XLL : 2AYI : 2DJW : 2QNW : 2X7Y : 2X80 : 2XJL : 2XK5 : 3HSU : 3LKV : 303J :

There are 22 PDB entries that satisfy the chosen criteria.

Figure 2. A screenshot displaying results of statistics
number 3, showing the number of Zn ions with coordination
number 4 coordinated by the given combination of ligands. In
the query, the following selections were made: entire PDB, Zn,
Asp and Glu amino acid residues, protein chain type and
coordination number 4.

[Figure 3 screenshot. Panel 'Dentate - ZN': "This statistic
shows distribution of monodentate and bidentate atoms of
selected metal ion for ASP and GLU molecules. Eg. for ASP
results, if MONO is 5 and BI is 3, than that means that there
are 5 atoms that are coordinated with one donor from carboxyl
group, and 3 atoms that are coordinated with both donor atoms
from carboxyl group."]

Residue   Monodentate   Bidentate
ASP       1514          358
GLU       756           380

There are 1241 PDB entries that satisfy the chosen criteria.

Figure 3. A screenshot of statistics number 4 (distribution of
monodentate and bidentate metal-carboxyl bindings for Asp and
Glu) obtained from a query that included all available PDB
structures, the Zn atom, Asp and Glu amino acid residues,
protein chain type and coordination number 4.

Examples of the outputs for Statistics 3, 4 and 5 are 
presented in Figures 2, 3 and 4, respectively. The query 
specified Zn as a metal of interest, Asp and Glu amino 
acid residues as ligand types, protein as a chain type, 4 as a 
desired coordination number and the entire database as a 
set of structures to be searched. 

Statistics 4 classifies the carboxyl binding as monodentate
when only one of its O atoms participates in the coordination
of a particular metal ion, and as bidentate when both oxygen
atoms are involved in the metal coordination.
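The monodentate/bidentate distinction can be expressed as a count of carboxylate oxygens within the coordination distance. The Python sketch below is illustrative only: the function name and the input layout are hypothetical, the oxygen names follow standard PDB nomenclature for Asp and Glu, and the 3 Å limit is the coordination bond threshold given in the Methods.

```python
import math

# Carboxylate oxygen atom names in standard PDB nomenclature.
CARBOXYL_OXYGENS = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}


def carboxyl_binding(metal_xyz, residue_name, atom_coords, cutoff=3.0):
    """Classify one Asp/Glu side chain as 'mono', 'bi' or None (not bound).

    atom_coords maps atom names of the residue to (x, y, z) tuples.
    """
    names = CARBOXYL_OXYGENS[residue_name]
    bound = sum(
        1
        for name in names
        if name in atom_coords
        and math.dist(metal_xyz, atom_coords[name]) <= cutoff
    )
    return {0: None, 1: "mono", 2: "bi"}[bound]
```

Counting the 'mono' and 'bi' labels over all Asp and Glu residues that bind the selected metal would give totals of the kind shown for Statistics 4.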

For the purpose of further processing, the results are
available in the CSV file format. Furthermore, for each of the
statistics, there is a list of PDB files that satisfy the
query, with links to their entries in the PDB.

The dump of the underlying database is publicly avail- 
able in a separate window that can be accessed from the 
starting page, and it gives users the opportunity to 
perform their own analysis. The database is on a regular 
weekly update schedule. 

The user interface was built as a web application using the
GWT development toolkit (http://code.google.com/webtoolkit/).
Tomcat is used as the application server.
[Figure 4 screenshot. Panel 'Bimetals - ZN': "This statistic
shows how metal ion is connected with other metal ions that
belong to the same chain. Eg. if Combination is FE - NI, and
amount of that combo is 97, than that means that there are 97
atoms of FE connected with atoms of NI"]

Combination   Count
AG            1
CA            3
CD            6
CO            2
CU            50
FE            27
K             5
MG            99
MN            9
NA            9
ZN            442

There are 316 PDB entries that satisfy the chosen criteria.

Figure 4. A screenshot displaying results of statistics
number 5, revealing the number of binuclear metal centers
containing the Zn ion, together with the percentage of metals
that form a binuclear metal center with the zinc dication. In
the query, the following selections were made: entire PDB, Zn,
Asp and Glu amino acid residues, protein chain type and
coordination number 4.



APPLICATION 

The statistics obtained using the presented database may help
in the identification and modeling of metal-binding sites in
protein structures derived by homology modeling and in the
design of proteins with a specific affinity for a certain
metal (so-called metal biosorbents), with potential
application in environmental chemistry. Carefully designed and
performed statistics could provide a clue about which metal
ion and/or environment would be the best choice for a certain
reaction. Thus, they should also help to improve the catalytic
performance of existing metalloenzymes and aid the design of
new ones.

CONCLUSIONS 

The website allows scientists to perform a number of different
searches and to obtain useful information about the selected
metal ions in biological macromolecules and their ligands.
Users can choose to retrieve information from all structures
available in the PDB, from a nonredundant set of protein
chains or from their own list of structures. The search can be
performed for proteins, for nucleic acids (RNAs and DNAs) or
for both. For each selection, users can retrieve information
about the coordination numbers, distances, percentage of
monodentately and bidentately bound Asp and Glu carboxyl
groups, percentage of metal ions with the selected
coordination number coordinated by a combination of selected
ligands, coordination geometry and population of particular
binuclear metal centers. We believe that BioMe will prove a
valuable tool for all research related to metal ions in
proteins and nucleic acids.